Overview

Dataset info

Number of variables: 9
Number of observations: 3589048
Missing cells: 0 (0.0%)
Duplicate rows: 48 (< 0.1%)
Total size in memory: 246.4 MiB
Average record size in memory: 72.0 B

Variable types

Numeric: 7
Categorical: 2
Boolean: 0
Date: 0
URL: 0
Text (Unique): 0
Rejected: 0
Unsupported: 0

Warnings

Dataset has 48 (< 0.1%) duplicate rows [Warning]
dropoff_datetime only contains datetime values, but is categorical. Consider applying pd.to_datetime() [Type]
dropoff_datetime has a high cardinality: 3339482 distinct values [Warning]
dropoff_latitude is highly skewed (γ1 = -26.15938649) [Skewed]
dropoff_longitude is highly skewed (γ1 = 26.19390567) [Skewed]
pickup_datetime only contains datetime values, but is categorical. Consider applying pd.to_datetime() [Type]
pickup_datetime has a high cardinality: 3340587 distinct values [Warning]
pickup_latitude is highly skewed (γ1 = -24.71643819) [Skewed]
pickup_longitude is highly skewed (γ1 = 24.74925192) [Skewed]
total_amount is highly skewed (γ1 = 1416.592082) [Skewed]
trip_distance has 51452 (1.4%) zeros [Zeros]
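
The type and duplicate-row warnings above can be addressed along the following lines. This is a minimal sketch, assuming the data is loaded into a pandas DataFrame with the column names shown in this report; the three-row frame is hypothetical and exists only to make the example runnable.

```python
import pandas as pd

# Hypothetical three-row frame in the report's column layout; the last row
# repeats the first to mimic the duplicate-row warning.
df = pd.DataFrame({
    "pickup_datetime": ["2015-02-01 01:26:45", "2015-01-02 20:06:28", "2015-02-01 01:26:45"],
    "dropoff_datetime": ["2015-02-01 01:49:58", "2015-01-02 20:14:04", "2015-02-01 01:49:58"],
    "trip_distance": [8.11, 1.29, 8.11],
})

# Parse the 19-character "YYYY-MM-DD HH:MM:SS" strings into datetime values,
# as the report suggests with pd.to_datetime().
for col in ("pickup_datetime", "dropoff_datetime"):
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d %H:%M:%S")

# Drop exact duplicate rows, keeping the first occurrence.
deduped = df.drop_duplicates().reset_index(drop=True)
```

Whether the 48 duplicate rows are genuine repeats or distinct trips that happen to share all nine values cannot be determined from the report alone, so dropping them is a judgment call.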

Variables

dropoff_datetime
Categorical

Distinct count: 3339482
Unique (%): 93.0%
Missing (%): 0.0%
Missing (n): 0
Value                   Count    Frequency (%)
2014-08-10 14:55:38     31       < 0.1%
2015-04-20 00:00:00     12       < 0.1%
2015-02-13 00:00:00     11       < 0.1%
2015-03-02 00:00:00     9        < 0.1%
2015-04-12 00:00:00     9        < 0.1%
2015-02-08 00:00:00     9        < 0.1%
2015-04-26 00:00:00     8        < 0.1%
2015-03-29 00:00:00     8        < 0.1%
2015-03-08 00:00:00     8        < 0.1%
2015-06-01 00:00:00     8        < 0.1%
Other values (3339472)  3588935  > 99.9%

Max length: 19
Mean length: 19
Min length: 19
Contains chars: False
Contains digits: True
Contains spaces: True
Contains non-words: True

dropoff_latitude
Numeric

Distinct count: 88454
Unique (%): 2.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 40.69178856
Minimum: 0
Maximum: 43.16053772
Zeros (%): 0.1%

Quantile statistics

Minimum: 0
5-th percentile: 40.66179276
Q1: 40.70624542
Median: 40.7504425
Q3: 40.79576874
95-th percentile: 40.84837341
Maximum: 43.16053772
Range: 43.16053772
Interquartile range: 0.0895233154

Descriptive statistics

Standard deviation: 1.550997685
Coef of variation: 0.03811574127
Kurtosis: 683.3172003
Mean: 40.69178856
MAD: 0.1265361092
Skewness: -26.15938649
Sum: 146044782.3
Variance: 2.405593818
Memory size: 27.4 MiB
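
Several of the figures above are derived from others in the same block. The sketch below recomputes them from the reported dropoff_latitude values, assuming the usual definitions (range = maximum - minimum, interquartile range = Q3 - Q1, coefficient of variation = standard deviation / mean, variance = standard deviation squared); the report itself does not state its formulas.

```python
# Values copied from the dropoff_latitude statistics in this report.
minimum, maximum = 0.0, 43.16053772
q1, q3 = 40.70624542, 40.79576874
mean, std = 40.69178856, 1.550997685

value_range = maximum - minimum   # reported Range: 43.16053772
iqr = q3 - q1                     # reported Interquartile range: 0.0895233154
coef_of_variation = std / mean    # reported Coef of variation: 0.03811574127
variance = std ** 2               # reported Variance: 2.405593818
```

The recomputed values agree with the reported ones up to rounding of the printed inputs.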
[Histogram with fixed size bins (bins=50)]
[Histogram with variable size bins (bins=[ 0. 14.78769493 36.9221077 39.94412231 40.51776505 ... 41.05973816 41.11589432 41.19598007 41.52406502 43.16053772], "bayesian blocks" binning strategy used)]

Value                 Count    Frequency (%)
0                     5199     0.1%
40.77432632           327      < 0.1%
40.77430344           324      < 0.1%
40.80513382           321      < 0.1%
40.80515671           305      < 0.1%
40.80512619           302      < 0.1%
40.80511856           302      < 0.1%
40.77428818           299      < 0.1%
40.7743187            297      < 0.1%
40.80514145           297      < 0.1%
Other values (88444)  3581075  99.8%

Minimum 5 values

Value        Count  Frequency (%)
0            5199   0.1%
29.57538986  1      < 0.1%
29.59758186  1      < 0.1%
36.09853363  1      < 0.1%
37.74568176  1      < 0.1%

Maximum 5 values

Value        Count  Frequency (%)
43.16053772  1      < 0.1%
42.89497757  1      < 0.1%
42.76523972  1      < 0.1%
42.7485733   1      < 0.1%
42.67972565  1      < 0.1%
 

dropoff_longitude
Numeric

Distinct count: 45682
Unique (%): 1.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: -73.82657599
Minimum: -122.3996277
Maximum: 0
Zeros (%): 0.1%

Quantile statistics

Minimum: -122.3996277
5-th percentile: -73.99716949
Q1: -73.96717834
Median: -73.94400024
Q3: -73.90833092
95-th percentile: -73.83304596
Maximum: 0
Range: 122.3996277
Interquartile range: 0.05884742735

Descriptive statistics

Standard deviation: 2.812673762
Coef of variation: -0.03809839106
Kurtosis: 684.6239492
Mean: -73.82657599
MAD: 0.2166120365
Skewness: 26.19390567
Sum: -264967124.9
Variance: 7.911133694
Memory size: 27.4 MiB
[Histogram with fixed size bins (bins=50)]
[Histogram with variable size bins (bins=[-122.39962769 -83.99078369 -75.31510162 -75.30848694 -74.50344849 ... -73.39675903 -73.02417374 -70.91507339 -14.09166622 0. ], "bayesian blocks" binning strategy used)]

Value                 Count    Frequency (%)
0                     5199     0.1%
-73.95274353          629      < 0.1%
-73.93916321          571      < 0.1%
-73.95272827          567      < 0.1%
-73.95276642          565      < 0.1%
-73.9393158           544      < 0.1%
-73.95278931          534      < 0.1%
-73.93917847          531      < 0.1%
-73.93932343          528      < 0.1%
-73.93914032          527      < 0.1%
Other values (45672)  3578853  99.7%

Minimum 5 values

Value         Count  Frequency (%)
-122.3996277  1      < 0.1%
-115.1458664  1      < 0.1%
-84.55179596  1      < 0.1%
-83.42977142  1      < 0.1%
-81.24718475  1      < 0.1%

Maximum 5 values

Value         Count  Frequency (%)
0             5199   0.1%
-28.18333244  1      < 0.1%
-70.91316986  1      < 0.1%
-70.91697693  1      < 0.1%
-70.92022705  1      < 0.1%
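
The dropoff coordinates contain 5199 zeros in each of latitude and longitude, plus isolated extremes such as longitude -122.3996277, while the 5th to 95th percentiles sit in a narrow band around latitude 40.7 and longitude -73.9. One possible way to screen such rows is a bounding-box filter. The sketch below is illustrative only: the box limits are an assumption chosen for the example, not values taken from the report, and `is_plausible` is a hypothetical helper.

```python
# Bounding box limits are an assumption for illustration, not report values.
LAT_MIN, LAT_MAX = 40.0, 42.0
LON_MIN, LON_MAX = -75.0, -72.0


def is_plausible(lat: float, lon: float) -> bool:
    """Return True if the coordinate pair lies inside the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


points = [
    (40.728386, -73.984764),  # dropoff coordinates from the first sample row
    (0.0, 0.0),               # hypothetical zero-filled pair
    (29.57538986, -73.95),    # reported minimum latitude with a made-up longitude
]

# Keep only the pairs that fall inside the box.
kept = [p for p in points if is_plausible(*p)]
```

The report does not show whether zero latitudes and zero longitudes occur in the same rows, so that pairing should be checked against the data before filtering.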
 

passenger_count
Numeric

Distinct count: 10
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 1.404418665
Minimum: 0
Maximum: 9
Zeros (%): < 0.1%

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 5
Maximum: 9
Range: 9
Interquartile range: 0

Descriptive statistics

Standard deviation: 1.094671723
Coef of variation: 0.7794482874
Kurtosis: 7.721451287
Mean: 1.404418665
MAD: 0.6746373892
Skewness: 2.946986301
Sum: 5040526
Variance: 1.198306181
Memory size: 27.4 MiB
[Histogram with fixed size bins (bins=10)]
[Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 3.5 4.5 5.5 6.5 8.5 9. ], "bayesian blocks" binning strategy used)]

Value  Count    Frequency (%)
1      2990459  83.3%
2      266574   7.4%
5      159314   4.4%
3      86323    2.4%
6      59417    1.7%
4      25901    0.7%
0      894      < 0.1%
8      80       < 0.1%
7      70       < 0.1%
9      16       < 0.1%

Minimum 5 values

Value  Count    Frequency (%)
0      894      < 0.1%
1      2990459  83.3%
2      266574   7.4%
3      86323    2.4%
4      25901    0.7%

Maximum 5 values

Value  Count   Frequency (%)
9      16      < 0.1%
8      80      < 0.1%
7      70      < 0.1%
6      59417   1.7%
5      159314  4.4%
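
Because passenger_count has only ten distinct values, its frequency table is complete, and the summary statistics can be cross-checked directly from it. The following sketch does that with the counts copied from the table; it assumes nothing beyond the usual definition of the mean.

```python
# Value -> count, copied from the passenger_count frequency table.
counts = {1: 2990459, 2: 266574, 5: 159314, 3: 86323, 6: 59417,
          4: 25901, 0: 894, 8: 80, 7: 70, 9: 16}

n = sum(counts.values())                                # 3589048 observations
total = sum(value * c for value, c in counts.items())   # 5040526, the reported Sum
mean = total / n                                        # reported Mean: 1.404418665
share_single = 100 * counts[1] / n                      # share of value 1, in percent
```

The totals match the reported number of observations and Sum, which suggests the table was extracted without loss.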
 

pickup_datetime
Categorical

Distinct count: 3340587
Unique (%): 93.1%
Missing (%): 0.0%
Missing (n): 0
Value                   Count    Frequency (%)
2014-08-09 15:54:25     31       < 0.1%
2014-06-07 00:00:00     17       < 0.1%
2014-07-05 00:00:00     13       < 0.1%
2014-07-12 00:00:00     13       < 0.1%
2014-06-06 00:00:00     13       < 0.1%
2014-06-22 00:00:00     12       < 0.1%
2014-06-01 00:00:00     12       < 0.1%
2014-04-30 00:00:00     11       < 0.1%
2014-05-29 00:00:00     11       < 0.1%
2014-05-30 00:00:00     11       < 0.1%
Other values (3340577)  3588904  > 99.9%

Max length: 19
Mean length: 19
Min length: 19
Contains chars: False
Contains digits: True
Contains spaces: True
Contains non-words: True

pickup_latitude
Numeric

Distinct count: 77041
Unique (%): 2.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 40.68698907
Minimum: 0
Maximum: 42.78691864
Zeros (%): 0.2%

Quantile statistics

Minimum: 0
5-th percentile: 40.67240906
Q1: 40.70282364
Median: 40.74771118
Q3: 40.8049202
95-th percentile: 40.8453331
Maximum: 42.78691864
Range: 42.78691864
Interquartile range: 0.1020965576

Descriptive statistics

Standard deviation: 1.641373779
Coef of variation: 0.04034149041
Kurtosis: 609.6833054
Mean: 40.68698907
MAD: 0.1367990944
Skewness: -24.71643819
Sum: 146027556.7
Variance: 2.694107883
Memory size: 27.4 MiB
[Histogram with fixed size bins (bins=50)]
[Histogram with variable size bins (bins=[ 0. 14.79031181 39.95358658 40.51891136 40.56499481 ... 40.93572044 40.97490692 41.07374001 41.17525101 42.78691864], "bayesian blocks" binning strategy used)]

Value                 Count    Frequency (%)
0                     5824     0.2%
40.72135162           1504     < 0.1%
40.72133636           1379     < 0.1%
40.72136688           1339     < 0.1%
40.72135544           1237     < 0.1%
40.72137833           1213     < 0.1%
40.72134018           1153     < 0.1%
40.7213707            1082     < 0.1%
40.72132874           1081     < 0.1%
40.72134781           1032     < 0.1%
Other values (77031)  3572204  99.5%

Minimum 5 values

Value        Count  Frequency (%)
0            5824   0.2%
29.58062363  1      < 0.1%
29.60992432  1      < 0.1%
36.08561325  1      < 0.1%
37.74568176  1      < 0.1%

Maximum 5 values

Value        Count  Frequency (%)
42.78691864  1      < 0.1%
42.74856567  1      < 0.1%
42.67842865  1      < 0.1%
42.66678238  1      < 0.1%
42.64542389  1      < 0.1%
 

pickup_longitude
Numeric

Distinct count: 36181
Unique (%): 1.0%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: -73.81414566
Minimum: -122.3996201
Maximum: 0
Zeros (%): 0.2%

Quantile statistics

Minimum: -122.3996201
5-th percentile: -73.9907074
Q1: -73.95853424
Median: -73.94424438
Q3: -73.91512299
95-th percentile: -73.84429932
Maximum: 0
Range: 122.3996201
Interquartile range: 0.0434112549

Descriptive statistics

Standard deviation: 2.97637456
Coef of variation: -0.04032254973
Kurtosis: 610.8643406
Mean: -73.81414566
MAD: 0.239971874
Skewness: 24.74925192
Sum: -264922511.9
Variance: 8.858805519
Memory size: 27.4 MiB
[Histogram with fixed size bins (bins=50)]
[Histogram with variable size bins (bins=[-122.39962006 -77.50876999 -75.31520844 -75.30993652 -74.49471283 ... -73.41581345 -73.03588867 -70.91751099 -35.45825577 0. ], "bayesian blocks" binning strategy used)]

Value                 Count    Frequency (%)
0                     5824     0.2%
-73.84429932          2283     0.1%
-73.84429169          2097     0.1%
-73.84427643          2089     0.1%
-73.8442688           1900     0.1%
-73.84430695          1888     0.1%
-73.84428406          1832     0.1%
-73.84431458          1736     < 0.1%
-73.8443222           1625     < 0.1%
-73.84425354          1567     < 0.1%
Other values (36171)  3566207  99.4%

Minimum 5 values

Value         Count  Frequency (%)
-122.3996201  1      < 0.1%
-115.1502762  1      < 0.1%
-84.55180359  1      < 0.1%
-83.42976379  1      < 0.1%
-81.2538681   1      < 0.1%

Maximum 5 values

Value         Count  Frequency (%)
0             5824   0.2%
-70.91651154  1      < 0.1%
-70.91851044  1      < 0.1%
-70.95729065  1      < 0.1%
-71.07021332  1      < 0.1%
 

total_amount
Numeric

Distinct count: 8196
Unique (%): 0.2%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 14.78220803
Minimum: -350
Maximum: 51192
Zeros (%): 0.3%

Quantile statistics

Minimum: -350
5-th percentile: 5.3
Q1: 7.8
Median: 11.3
Q3: 18
95-th percentile: 35
Maximum: 51192
Range: 51542
Interquartile range: 10.2

Descriptive statistics

Standard deviation: 29.77613526
Coef of variation: 2.01432257
Kurtosis: 2431553.483
Mean: 14.78220803
MAD: 7.542221629
Skewness: 1416.592082
Sum: 53054054.16
Variance: 886.6182311
Memory size: 27.4 MiB
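
The very large skewness and kurtosis of total_amount are consistent with a few extreme values dominating the third and fourth moments. The sketch below illustrates the effect on made-up numbers using the population (biased) skewness m3 / m2^1.5. This estimator is an assumption: the report does not say which skewness formula it uses, and pandas, for example, applies a sample-size correction. The `skewness` helper and both lists are hypothetical.

```python
def skewness(values):
    """Population (biased) skewness: third central moment over m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical amounts: a symmetric set, then the same set plus one extreme value.
symmetric = [5.0, 10.0, 15.0, 20.0, 25.0]
with_outlier = symmetric + [5000.0]

skew_symmetric = skewness(symmetric)        # zero for a symmetric sample
skew_with_outlier = skewness(with_outlier)  # strongly positive
```

A single extreme value is enough to move the skewness of an otherwise symmetric sample well above 1, which is why the reported figure says more about the tails than about typical amounts.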
[Histogram with fixed size bins (bins=50)]
[Histogram with variable size bins (bins=[-3.50000e+02 -1.00815e+02 -5.31500e+01 -5.22500e+01 -2.54000e+01 ... 4.78150e+02 5.02165e+02 9.79650e+02 3.69398e+03 5.11920e+04], "bayesian blocks" binning strategy used)]

Value                Count    Frequency (%)
8.3                  74120    2.1%
7.8                  72106    2.0%
6.8                  72098    2.0%
7.3                  70554    2.0%
8                    69240    1.9%
7                    67851    1.9%
6.3                  64537    1.8%
6.5                  61922    1.7%
8.8                  60677    1.7%
6                    57694    1.6%
Other values (8186)  2918249  81.3%

Minimum 5 values

Value    Count  Frequency (%)
-350     1      < 0.1%
-300     1      < 0.1%
-259.33  1      < 0.1%
-250.8   1      < 0.1%
-250     2      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
51192    1      < 0.1%
4035.46  1      < 0.1%
3352.5   1      < 0.1%
2665.5   1      < 0.1%
1229.8   1      < 0.1%
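
The negative amounts and the maximum of 51192 are candidates for outlier screening. One common approach, shown here as a sketch rather than a recommendation, is Tukey's fences computed from the reported quartiles. The multiplier k = 1.5 is the conventional default and an assumption in this context, and `is_outlier` is a hypothetical helper.

```python
# Quartiles copied from the total_amount quantile statistics.
q1, q3 = 7.8, 18.0
iqr = q3 - q1                  # about 10.2, as reported

# Tukey's rule with the conventional multiplier k = 1.5 (an assumption here).
k = 1.5
lower_fence = q1 - k * iqr     # about -7.5
upper_fence = q3 + k * iqr     # about 33.3


def is_outlier(amount: float) -> bool:
    """Flag amounts outside the [lower_fence, upper_fence] interval."""
    return amount < lower_fence or amount > upper_fence


# Reported minimum, median, 95th percentile and maximum.
flags = {amount: is_outlier(amount) for amount in (-350, 11.3, 35, 51192)}
```

Note that the reported 95th percentile (35) already lies above the upper fence of about 33.3, so this rule would flag roughly 5% or more of all rows. For a column this skewed, a looser multiplier or a domain-based cap may be more appropriate.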
 

trip_distance
Numeric

Distinct count: 3614
Unique (%): 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 2.94962022
Minimum: 0
Maximum: 439.53
Zeros (%): 1.4%

Quantile statistics

Minimum: 0
5-th percentile: 0.48
Q1: 1.1
Median: 1.99
Q3: 3.78
95-th percentile: 8.57
Maximum: 439.53
Range: 439.53
Interquartile range: 2.68

Descriptive statistics

Standard deviation: 2.980152093
Coef of variation: 1.01035112
Kurtosis: 244.8738648
Mean: 2.94962022
MAD: 2.026236885
Skewness: 4.863721805
Sum: 10586328.55
Variance: 8.881306498
Memory size: 27.4 MiB
[Histogram with fixed size bins (bins=50)]
[Histogram with variable size bins (bins=[0.00000e+00 5.00000e-03 1.50000e-02 2.50000e-02 3.50000e-02 ... 5.28050e+01 6.23700e+01 9.43700e+01 1.43245e+02 4.39530e+02], "bayesian blocks" binning strategy used)]

Value                Count    Frequency (%)
0                    51452    1.4%
0.9                  40130    1.1%
1                    39985    1.1%
0.8                  38819    1.1%
1.1                  38440    1.1%
1.2                  36676    1.0%
1.3                  34837    1.0%
0.7                  34739    1.0%
1.4                  32783    0.9%
1.5                  30834    0.9%
Other values (3604)  3210353  89.4%

Minimum 5 values

Value  Count  Frequency (%)
0      51452  1.4%
0.01   3219   0.1%
0.02   2419   0.1%
0.03   2126   0.1%
0.04   1816   0.1%

Maximum 5 values

Value   Count  Frequency (%)
439.53  1      < 0.1%
375.64  1      < 0.1%
250.24  1      < 0.1%
176.53  1      < 0.1%
146.27  1      < 0.1%
 

Correlations

Missing values

Sample

First rows

index    dropoff_datetime     dropoff_latitude  dropoff_longitude  passenger_count  pickup_datetime      pickup_latitude  pickup_longitude  total_amount  trip_distance
0        2015-02-01 01:49:58  40.728386         -73.984764         1                2015-02-01 01:26:45  40.811172        -73.953545        27.80         8.11
1        2015-01-02 20:14:04  40.711475         -73.961571         1                2015-01-02 20:06:28  40.714321        -73.946709        9.80          1.29
2        2014-09-27 18:19:56  40.777813         -73.947304         5                2014-09-27 17:55:38  40.718094        -73.957626        26.30         6.12
3        2014-04-27 02:39:02  40.718582         -73.987785         2                2014-04-27 02:27:04  40.713997        -73.949501        17.30         3.68
4        2014-05-26 18:44:13  40.664013         -73.977325         1                2014-05-26 18:32:19  40.672195        -73.944092        11.50         2.40
5        2015-03-04 21:43:47  40.812088         -73.944008         1                2015-03-04 21:36:48  40.804962        -73.954826        9.36          1.16
6        2015-01-21 09:51:01  40.656422         -73.865089         1                2015-01-21 09:27:41  40.730438        -73.862000        25.80         7.50
7        2015-03-07 19:20:49  40.635708         -74.009293         6                2015-03-07 18:51:58  40.675697        -73.971947        21.30         4.58
8        2015-01-11 17:04:26  40.684875         -73.923279         1                2015-01-11 16:55:04  40.681896        -73.949883        8.80          1.51
9        2014-05-30 06:00:00  40.781448         -73.949173         1                2014-05-30 05:53:15  40.789875        -73.952370        7.50          1.20

Last rows

index    dropoff_datetime     dropoff_latitude  dropoff_longitude  passenger_count  pickup_datetime      pickup_latitude  pickup_longitude  total_amount  trip_distance
3589038  2014-05-05 16:56:48  40.708706         -73.800034         3                2014-05-05 16:51:38  40.716911        -73.803459        6.50          0.70
3589039  2015-01-12 21:22:43  40.583813         -73.937790         1                2015-01-12 21:17:27  40.587360        -73.953857        9.10          1.00
3589040  2014-04-23 09:58:11  40.789410         -73.952705         1                2014-04-23 09:45:48  40.805065        -73.939651        12.38         1.84
3589041  2015-04-18 21:09:33  40.843395         -73.905083         1                2015-04-18 20:57:46  40.814789        -73.914703        11.80         2.43
3589042  2014-06-10 21:56:17  40.674210         -73.967087         2                2014-06-10 21:39:48  40.650593        -74.004562        15.50         3.50
3589043  2015-05-30 01:15:53  40.680450         -73.956551         1                2015-05-30 01:03:53  40.680923        -73.977394        11.30         1.95
3589044  2015-06-28 13:07:06  40.688084         -73.995346         1                2015-06-28 12:55:48  40.679119        -73.999702        9.30          0.81
3589045  2015-05-02 09:51:01  40.793259         -73.951996         1                2015-05-02 09:45:50  40.807251        -73.945900        6.80          1.19
3589046  2014-08-11 14:23:52  40.823505         -73.941353         1                2014-08-11 14:10:03  40.795506        -73.941856        15.00         2.60
3589047  2015-06-10 18:27:00  40.671173         -73.974541         2                2015-06-10 18:21:15  40.676311        -73.962761        7.80          1.00
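
Once the two timestamp columns are parsed, a trip duration and an average speed can be derived as a sanity check on individual rows. The sketch below uses only the standard library and the first sample row. `trip_seconds` is a hypothetical helper, and the speed is expressed in trip_distance units per hour because the report does not state the distance unit.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def trip_seconds(pickup: str, dropoff: str) -> float:
    """Elapsed time between two timestamp strings, in seconds."""
    start = datetime.strptime(pickup, FMT)
    end = datetime.strptime(dropoff, FMT)
    return (end - start).total_seconds()


# First sample row: pickup_datetime, dropoff_datetime, trip_distance.
pickup, dropoff, distance = "2015-02-01 01:26:45", "2015-02-01 01:49:58", 8.11

seconds = trip_seconds(pickup, dropoff)   # 23 min 13 s = 1393.0 seconds
speed = distance / (seconds / 3600)       # distance units per hour
```

Rows with a zero or negative duration, or an implausibly high derived speed, would be natural candidates for closer inspection alongside the zero-distance trips flagged in the warnings.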